Overview

Dataset Statistics

Number of Variables 16
Number of Rows 20275
Missing Cells 15198
Missing Cells (%) 4.7%
Duplicate Rows 0
Duplicate Rows (%) 0.0%
Total Size in Memory 21.3 MB
Average Row Size in Memory 1.1 KB
Variable Types
  • Numerical: 7
  • Categorical: 9
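
The "Missing Cells (%)" figure can be reproduced from the other numbers in this block. The sketch below assumes the percentage is taken over all cells (rows × variables); the report does not say so explicitly, but the arithmetic agrees with it.

```python
# Figures copied from the "Dataset Statistics" block.
n_rows = 20275
n_columns = 16
missing_cells = 15198

# Assumption: the percentage is relative to every cell in the table.
total_cells = n_rows * n_columns
missing_pct = 100 * missing_cells / total_cells

print(f"{total_cells} cells, {missing_pct:.1f}% missing")
# prints "324400 cells, 4.7% missing"
```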

Dataset Insights

Уч_Заведение has 1811 (8.93%) missing values
Где_Находится_УЗ has 2041 (10.07%) missing values
Год_Окончания_УЗ has 1917 (9.45%) missing values
Страна_ПП has 507 (2.5%) missing values
Регион_ПП has 908 (4.48%) missing values
Город_ПП has 657 (3.24%) missing values
Страна_Родители has 656 (3.24%) missing values
Статус has 6691 (33.0%) missing values
ID is skewed
Год_Поступления is skewed
Год_Окончания_УЗ is skewed
КодФакультета is skewed
СрБаллАттестата is skewed
test is skewed
Уч_Заведение has a high cardinality: 4709 distinct values
Где_Находится_УЗ has a high cardinality: 2710 distinct values
Регион_ПП has a high cardinality: 240 distinct values
Город_ПП has a high cardinality: 2244 distinct values
Пол has constant length 3
Основания has constant length 2
test has 13584 (67.0%) negatives

Variables


ID

numerical

Approximate Distinct Count 20275
Approximate Unique (%) 100.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 324400
Mean 70131.0307
Minimum 44602
Maximum 264403
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • ID is skewed right (γ1 = 0.2237)

Quantile Statistics

Minimum 44602
5-th Percentile 48456.7
Q1 63537.5
Median 71393
Q3 78383.5
95-th Percentile 88329.3
Maximum 264403
Range 219801
IQR 14846

Descriptive Statistics

Mean 70131.0307
Standard Deviation 12724.7041
Variance 1.6192e+08
Sum 1.4219e+09
Skewness 0.2237
Kurtosis 4.7112
Coefficient of Variation 0.1814
  • ID is not normally distributed (p-value 8.946464152258215e-08)
  • ID has 88 outliers
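
The report does not state which rule produces its outlier counts. As a point of comparison only, the sketch below computes the common 1.5 × IQR fences from the quartiles reported for ID; the multiplier `k=1.5` and the helper name `iqr_fences` are assumptions, not something taken from the report.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return (lower, upper) fences at k * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Q1 and Q3 as reported for ID.
lower, upper = iqr_fences(63537.5, 78383.5)
print(lower, upper)
# prints "41268.5 100652.5"
```

Under this rule the lower fence sits below the reported minimum (44602), so any flagged ID values would lie above the upper fence.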

Код_группы

numerical

Approximate Distinct Count 4233
Approximate Unique (%) 20.9%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 324400
Mean 18305.2407
Minimum 11550
Maximum 22824
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • Код_группы is skewed left (γ1 = -0.5361)

Quantile Statistics

Minimum 11550
5-th Percentile 13627
Q1 16899
Median 18536
Q3 20599
95-th Percentile 21851
Maximum 22824
Range 11274
IQR 3700

Descriptive Statistics

Mean 18305.2407
Standard Deviation 2534.799
Variance 6.4252e+06
Sum 3.7114e+08
Skewness -0.5361
Kurtosis -0.4266
Coefficient of Variation 0.1385

Год_Поступления

numerical

Approximate Distinct Count 20
Approximate Unique (%) 0.1%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 324400
Mean 2014.9921
Minimum 2001
Maximum 2212
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • Год_Поступления is skewed right (γ1 = 27.2891)

Quantile Statistics

Minimum 2001
5-th Percentile 2012
Q1 2013
Median 2015
Q3 2016
95-th Percentile 2018
Maximum 2212
Range 211
IQR 3

Descriptive Statistics

Mean 2014.9921
Standard Deviation 2.4017
Variance 5.7682
Sum 4.0854e+07
Skewness 27.2891
Kurtosis 2231.4602
Coefficient of Variation 0.001192
  • Год_Поступления is not normally distributed (p-value 1.7521867835092935e-21)
  • Год_Поступления has 15 outliers
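
The maximum of 2212 lies almost two centuries past the 95th percentile (2018), which points to a data-entry error rather than a real admission year. A minimal sketch for listing such values follows; the bounds and the sample list are illustrative assumptions, not values taken from the dataset.

```python
from datetime import date

def implausible_years(years, earliest=1950, latest=None):
    """Return the values that fall outside [earliest, latest]."""
    if latest is None:
        latest = date.today().year  # assumption: no admissions in the future
    return [y for y in years if not earliest <= y <= latest]

# Made-up sample that only reuses the reported minimum, quantiles and maximum.
sample = [2001, 2013, 2015, 2018, 2212]
print(implausible_years(sample))
```

The flagged rows would still need manual review, since a typo such as 2212 could stand for 2012 or for some other year.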

Пол

categorical

Approximate Distinct Count 4
Approximate Unique (%) 0.0%
Missing 10
Missing (%) 0.0%
Memory Size 2087295

Length

Mean 3
Standard Deviation 0
Median 3
Minimum 3
Maximum 3

Sample

1st row Жен
2nd row Муж
3rd row Жен
4th row Жен
5th row Жен

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (Жен, Муж) take over 50.0%
  • Пол has words of constant length

Основания

categorical

Approximate Distinct Count 6
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 2007225

Length

Mean 2
Standard Deviation 0
Median 2
Minimum 2
Maximum 2

Sample

1st row ОО
2nd row ЦН
3rd row ДН
4th row БН
5th row БН

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (ОО, СН) take over 50.0%
  • Основания has words of constant length

Уч_Заведение

categorical

Approximate Distinct Count 4709
Approximate Unique (%) 25.5%
Missing 1811
Missing (%) 8.9%
Memory Size 4376189

Length

Mean 37.8871
Standard Deviation 21.1705
Median 40
Minimum 2
Maximum 177

Sample

1st row МБОУ "СОШ №59"
2nd row МБОУ Алтайская СОШ...
3rd row ФГБОУ ВО Алтайский...
4th row ФГБОУ ВО Алтайский...
5th row МБОУ "СОШ №110"

Letter

Count 26
Lowercase Letter 1
Space Separator 68859
Uppercase Letter 25
Dash Punctuation 1843
Decimal Number 13613
  • Уч_Заведение contains many words: 3115 words
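
The "Approximate Unique (%)" value appears to be computed over non-missing rows rather than over all rows. The sketch below checks that reading against the figures reported for Уч_Заведение; the helper name `unique_pct` is hypothetical, and the denominator is an inference from the numbers, not a documented rule.

```python
def unique_pct(distinct, n_rows, missing):
    """Distinct values as a percentage of non-missing rows."""
    non_missing = n_rows - missing
    return 100 * distinct / non_missing

# Distinct count, row count and missing count as reported for Уч_Заведение.
print(round(unique_pct(4709, 20275, 1811), 1))
# prints "25.5"

# Dividing by all rows instead would not match the reported 25.5%.
print(round(100 * 4709 / 20275, 1))
# prints "23.2"
```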

Где_Находится_УЗ

categorical

Approximate Distinct Count 2710
Approximate Unique (%) 14.9%
Missing 2041
Missing (%) 10.1%
Memory Size 3669190

Length

Mean 28.7842
Standard Deviation 10.6152
Median 26
Minimum 2
Maximum 79

Sample

1st row Алтайский край, Ба...
2nd row Алтайский край, Ал...
3rd row Алтайский край, г....
4th row Алтайский край, г....
5th row Алтайский край, Ба...

Letter

Count 4
Lowercase Letter 4
Space Separator 57927
Uppercase Letter 0
Dash Punctuation 4551
Decimal Number 17
  • Где_Находится_УЗ contains many words: 1724 words

Год_Окончания_УЗ

numerical

Approximate Distinct Count 45
Approximate Unique (%) 0.2%
Missing 1917
Missing (%) 9.5%
Infinite 0
Infinite (%) 0.0%
Memory Size 293728
Mean 2013.8512
Minimum 1966
Maximum 2020
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • Год_Окончания_УЗ is skewed left (γ1 = -3.0195)

Quantile Statistics

Minimum 1966
5-th Percentile 2005
Q1 2013
Median 2015
Q3 2016
95-th Percentile 2018
Maximum 2020
Range 54
IQR 3

Descriptive Statistics

Mean 2013.8512
Standard Deviation 4.3156
Variance 18.6246
Sum 3.697e+07
Skewness -3.0195
Kurtosis 13.3964
Coefficient of Variation 0.002143
  • Год_Окончания_УЗ is not normally distributed (p-value 1.467499253521647e-09)
  • Год_Окончания_УЗ has 1451 outliers

Страна_ПП

categorical

Approximate Distinct Count 26
Approximate Unique (%) 0.1%
Missing 507
Missing (%) 2.5%
Memory Size 2285241
  • The largest value (Россия) is over 30.48 times larger than the second largest value (Казахстан)

Length

Mean 6.1511
Standard Deviation 0.8191
Median 6
Minimum 5
Maximum 22

Sample

1st row Россия
2nd row Россия
3rd row Россия
4th row Россия
5th row Россия

Letter

Count 0
Lowercase Letter 0
Space Separator 23
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (Россия, Казахстан) take over 50.0%
  • The largest value (россия) is over 29.81 times larger than the second largest value (казахстан)

Регион_ПП

categorical

Approximate Distinct Count 240
Approximate Unique (%) 1.2%
Missing 908
Missing (%) 4.5%
Memory Size 2854014
  • The largest value (Алтайский край) is over 56.81 times larger than the second largest value (Республика Алтай)

Length

Mean 14.3513
Standard Deviation 2.2619
Median 14
Minimum 3
Maximum 43

Sample

1st row Алтайский край
2nd row Алтайский край
3rd row Алтайский край
4th row Алтайский край
5th row Алтайский край

Letter

Count 0
Lowercase Letter 0
Space Separator 19366
Uppercase Letter 0
Dash Punctuation 574
Decimal Number 0
  • The top 2 categories (Алтайский край, Республика Алтай) take over 50.0%

Город_ПП

categorical

Approximate Distinct Count 2244
Approximate Unique (%) 11.4%
Missing 657
Missing (%) 3.2%
Memory Size 2536631
  • The largest value (Барнаул г) is over 2.15 times larger than the second largest value (г. Барнаул)

Length

Mean 9.9033
Standard Deviation 2.8418
Median 9
Minimum 1
Maximum 53

Sample

1st row Барнаул г
2nd row Барнаул г
3rd row Алтайское с
4th row г. Барнаул
5th row г. Барнаул

Letter

Count 8
Lowercase Letter 8
Space Separator 17473
Uppercase Letter 0
Dash Punctuation 1259
Decimal Number 33
  • Город_ПП contains many words: 1567 words
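
The two most frequent values, "Барнаул г" and "г. Барнаул", are spelling variants of the same city, so part of the high cardinality of Город_ПП is likely a formatting effect. A minimal normalization sketch follows; the marker set is an assumption limited to the two abbreviations visible in the sample rows, and `split_settlement` is a hypothetical helper.

```python
# Assumption: only the settlement-type markers seen in the sample ("г", "с").
MARKERS = {"г", "с"}

def split_settlement(raw):
    """Split a value into (name, type_marker); marker is None if absent."""
    tokens = raw.strip().split()
    if len(tokens) >= 2:
        # Look for a marker in the first or the last position only.
        for idx in (0, -1):
            bare = tokens[idx].rstrip(".").lower()
            if bare in MARKERS:
                name_tokens = tokens[1:] if idx == 0 else tokens[:-1]
                return " ".join(name_tokens), bare
    return " ".join(tokens), None

for value in ["Барнаул г", "г. Барнаул", "Алтайское с"]:
    print(split_settlement(value))
```

Keeping the marker as a separate field, instead of dropping it, avoids merging a city and a village that share a name.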

Страна_Родители

categorical

Approximate Distinct Count 23
Approximate Unique (%) 0.1%
Missing 656
Missing (%) 3.2%
Memory Size 2270257
  • The largest value (Россия) is over 24.32 times larger than the second largest value (Казахстан)

Length

Mean 6.1795
Standard Deviation 0.8761
Median 6
Minimum 3
Maximum 22

Sample

1st row Россия
2nd row Россия
3rd row Россия
4th row Россия
5th row Россия

Letter

Count 0
Lowercase Letter 0
Space Separator 12
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (Россия, Казахстан) take over 50.0%
  • The largest value (россия) is over 24.26 times larger than the second largest value (казахстан)

КодФакультета

numerical

Approximate Distinct Count 20
Approximate Unique (%) 0.1%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 324400
Mean 32.5201
Minimum 24
Maximum 53
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • КодФакультета is skewed right (γ1 = 0.9322)

Quantile Statistics

Minimum 24
5-th Percentile 24
Q1 26
Median 28
Q3 40
95-th Percentile 51
Maximum 53
Range 29
IQR 14

Descriptive Statistics

Mean 32.5201
Standard Deviation 8.4766
Variance 71.8531
Sum 659345
Skewness 0.9322
Kurtosis -0.3716
Coefficient of Variation 0.2607
  • КодФакультета is not normally distributed (p-value 1.4511536337759456e-11)

СрБаллАттестата

numerical

Approximate Distinct Count 527
Approximate Unique (%) 2.6%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 324400
Mean 72.7144
Minimum 0
Maximum 7232
Zeros 11
Zeros (%) 0.1%
Negatives 0
Negatives (%) 0.0%
  • СрБаллАттестата is skewed right (γ1 = 16.8529)

Quantile Statistics

Minimum 0
5-th Percentile 4
Q1 47
Median 61
Q3 75
95-th Percentile 93
Maximum 7232
Range 7232
IQR 28

Descriptive Statistics

Mean 72.7144
Standard Deviation 255.538
Variance 65299.6729
Sum 1.4743e+06
Skewness 16.8529
Kurtosis 293.8825
Coefficient of Variation 3.5143
  • СрБаллАттестата is not normally distributed (p-value 4.226726831628287e-25)
  • СрБаллАттестата has 2537 outliers
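
The quantiles suggest that СрБаллАттестата mixes several scales: the 5th percentile is 4, the bulk of values lies between 47 and 93, and the maximum is 7232. The sketch below buckets values by a guessed scale before any cleaning; the boundaries (5 and 100) and the bucket names are assumptions and should be checked against how the grades were actually recorded.

```python
from collections import Counter

def score_scale(value):
    """Assign a value to a guessed grading scale (boundaries are assumptions)."""
    if value <= 0:
        return "zero_or_negative"
    if value <= 5:
        return "five_point"
    if value <= 100:
        return "hundred_point"
    return "out_of_range"

# Made-up sample that only reuses the reported quantiles and extremes.
sample = [0, 4, 47, 61, 75, 93, 7232]
print(Counter(score_scale(v) for v in sample))
```

Counting the buckets on the real column would show how many rows need rescaling and how many are likely entry errors.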

Статус

categorical

Approximate Distinct Count 3
Approximate Unique (%) 0.0%
Missing 6691
Missing (%) 33.0%
Memory Size 924326
  • The largest value (4.0) is over 1.75 times larger than the second largest value (3.0)

Length

Mean 3.0452
Standard Deviation 0.2078
Median 3
Minimum 3
Maximum 4

Sample

1st row 3.0
2nd row 4.0
3rd row 4.0
4th row 4.0
5th row 4.0

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 614
Decimal Number 27168
  • The top 2 categories (4.0, 3.0) take over 50.0%
  • The largest value (40) is over 1.75 times larger than the second largest value (30)
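
Статус is profiled as categorical, but its sample values are float-formatted codes such as "3.0" and "4.0". The length statistics (minimum 3, maximum 4) and the 614 dash characters are consistent with a third code written as "-1.0"; that reading is an inference, not something the report states. A minimal sketch for converting such strings to integer codes, with a hypothetical helper name:

```python
def parse_status(raw):
    """Convert a float-formatted code such as '4.0' to an int; keep missing as None."""
    if raw is None:
        return None
    text = str(raw).strip()
    if not text:
        return None
    number = float(text)
    if not number.is_integer():
        raise ValueError(f"not an integer code: {raw!r}")
    return int(number)

print([parse_status(v) for v in ["3.0", "4.0", "-1.0", None]])
# prints "[3, 4, -1, None]"
```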

test

numerical

Approximate Distinct Count 6692
Approximate Unique (%) 33.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 324400
Mean 1103.2213
Minimum -1
Maximum 6690
Zeros 1
Zeros (%) 0.0%
Negatives 13584
Negatives (%) 67.0%
  • test is skewed right (γ1 = 1.5549)

Quantile Statistics

Minimum -1
5-th Percentile -1
Q1 -1
Median -1
Q3 1621.5
95-th Percentile 5676.3
Maximum 6690
Range 6691
IQR 1622.5

Descriptive Statistics

Mean 1103.2213
Standard Deviation 1925.3076
Variance 3.7068e+06
Sum 2.2368e+07
Skewness 1.5549
Kurtosis 0.9831
Coefficient of Variation 1.7452
  • test is not normally distributed (p-value 4.226522723833531e-25)
  • test has 2635 outliers
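
The statistics for test are consistent with a sentinel-plus-index column: -1 in 13584 rows, and the integers 0 to 6690 once each in the remaining rows. This is a hypothesis, assuming the column holds integers only; the sketch below checks it against the reported sum and mean.

```python
n_rows = 20275
n_negative = 13584                      # reported count of negatives
n_non_negative = n_rows - n_negative    # rows left for the values 0..6690

# Hypothesis: every negative is -1 and the rest are 0..6690, each once.
hyp_sum = sum(range(n_non_negative)) - n_negative
hyp_mean = hyp_sum / n_rows

print(n_non_negative, hyp_sum, round(hyp_mean, 4))
# prints "6691 22367811 1103.2213"
```

The result matches the reported mean (1103.2213), sum (2.2368e+07), maximum (6690), single zero, and distinct count (6692). The 6691 non-negative rows also equal the number of missing Статус values, which may indicate that test indexes the rows without a status; the report itself does not confirm this.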

Interactions

Correlations

Missing Values